Module 9 Lecture

Computational Theory

The heart of the study of computer science.

What can computers do?
What can they do only poorly? (ex: TSP, the Traveling Salesman Problem)
What can they not do? (ex: HP, the Halting Problem)

The technology is irrelevant. Instead...

Abstract mathematical model of computer.
Program as a function: a mapping from input to output
Domain: the input may be any finite string
Range: the output is some member of the same set of strings

Input, all possible strings ----> p(input) ----> output, some strings

Inputs cause ...

Normal halt
Loop forever
Crash while running

The latter two do not map to output; p(input) is UNDEFINED in those cases. The machine that executes these programs is called the TURING MACHINE (TM). It is a mathematical object described as follows.

Input alphabet, Σ: usually 0, 1 plus special symbols as needed.
Output alphabet: Γ
Tape: a left-bounded, endless tape whose cells each hold a single symbol. Initially the tape holds only the input string, followed by endless blanks "b".
Tape Head: reads the contents of the current cell, replaces the contents, and moves one cell left or right
Set of States that the program can be in. The initial state is distinguished in advance; we'll call it state #1. There are 0 or more HALT states (usually designated as such, or recognized as states from which there is no transition out).
Program: set of rules written as 5-tuples,
(currentState, readInput, writeOutput, moveDirection, newState)

Programs terminate by entering a Halt state or by crashing (when no rule applies to the symbol read in the current state); otherwise they loop forever.

EXAMPLE

Program to determine whether the input tape contains all of the same input symbols.

STATE DIAGRAM VIEW OF THE TM

Output: Blank tape: Valid input (TM accepts the tape) -- TMs are often used to accept an input
Crash: Invalid input -- the TM has rejected the input by crashing

Demo on several input tapes on the board.

5-TUPLES REPRESENTATION OF THE TM

Shorthand symbolism:

List the machine as a set of rules:

(currentState, readInput, writeOutput, moveDirection, newState)

The machine has rules,

       (1,0,b,R,2)
       (1,1,b,R,4)
       (2,0,b,R,2)
       (2,b,b,R,3)
       (4,1,b,R,4)
       (4,b,b,R,3)

which lead to its program set,

{(1,0,b,R,2), (1,1,b,R,4), (2,0,b,R,2), (2,b,b,R,3), (4,1,b,R,4), (4,b,b,R,3)}
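
The rule set above can be traced by hand, but a small simulator makes the "demo on several input tapes" repeatable. The following is a minimal Python sketch, not part of the lecture: the names run_tm, render and ALL_SAME are invented here, state 3 is assumed to be the halt state (it has no rules), and moving left off the left-bounded tape is assumed to count as a crash. A step budget stands in for "loops forever", since no program can detect that in general.

```python
from collections import defaultdict

BLANK = "b"

# The six rules listed above; state 3 has no rules and is assumed to be the halt state.
ALL_SAME = [
    (1, "0", "b", "R", 2), (1, "1", "b", "R", 4),
    (2, "0", "b", "R", 2), (2, "b", "b", "R", 3),
    (4, "1", "b", "R", 4), (4, "b", "b", "R", 3),
]

def render(tape):
    """Return the visited part of the tape as a string."""
    return "".join(tape[i] for i in range(max(tape, default=-1) + 1))

def run_tm(rules, tape_input, start=1, halt_states=(3,), max_steps=10_000):
    """Run 5-tuple rules (state, read, write, move, new_state) on an input string.

    Returns (outcome, tape) where outcome is "halt", "crash", or "timeout"
    (the step budget ran out, so the machine may be looping forever).
    """
    # Index the rules by (current state, symbol read) for quick lookup.
    table = {(s, r): (w, m, n) for (s, r, w, m, n) in rules}
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    state, head = start, 0
    for _ in range(max_steps):
        if state in halt_states:
            return "halt", render(tape)
        key = (state, tape[head])
        if key not in table:          # no rule applies: the machine crashes
            return "crash", render(tape)
        write, move, state = table[key]
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:                  # assumption: falling off the left end is a crash
            return "crash", render(tape)
    return "timeout", render(tape)

print(run_tm(ALL_SAME, "000"))    # accepted: halts with a blank tape
print(run_tm(ALL_SAME, "0110"))   # rejected: crashes in state 2 on the first 1
```

Note that the empty input also crashes under these rules, because state 1 has no rule for reading a blank.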

SERIAL NUMBER OF THE TM

Unary notation represents a number n by repeating a single symbol n times. For example, the unary representation of decimal 1,000,000 needs only one type of symbol, but that symbol is repeated a million times.

Let the states be designated in UNARY NOTATION WITH ZEROES ... meaning n is encoded as a string of n zeroes as below.

      state 1 = 0
      state 2 = 00
      state 3 = 000  
      ...
      state12 = 000000000000
      etc.

Let the input and output characters in an alphabet of n symbols be represented likewise, with an appropriate coding scheme. For example, in all of our examples, the standard input and output sets will be,
Σ = Γ = {0, 1, b}
which leads to
```
       0 = 0
       1 = 00
       b = 000
```
If we chose to include other special characters, we could define them likewise. For example,
Σ = {0, 1, b, #, $}
gives
```
       0 = 0
       1 = 00
       b = 000
       # = 0000
       $ = 00000
```
for the input characters.
Let the direction be represented:
```
       Left = 0
       Right = 00
```

Then separate the pieces of the 5-tuple with 1's as spacers. So (1,0,b,R,2) becomes:

0 1 0 1 000 1 00 1 00

Then, list all the rules, separated by 11:

0101000100100 11 0100100010010000 11 etc.

Then surround the entire collection with 111's and we have a Turing Machine definition in binary. This is often called the SERIAL NUMBER of the TM.
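
The encoding steps above are mechanical, so they can be sketched in a few lines of Python. This is an illustration only: the names SYMBOL_CODE, MOVE_CODE, encode_rule and serial_number are invented here, and the symbol codes assume the standard alphabet {0, 1, b} used in our examples.

```python
# Number of zeroes used for each symbol and direction, per the coding scheme above.
SYMBOL_CODE = {"0": 1, "1": 2, "b": 3}
MOVE_CODE = {"L": 1, "R": 2}

def encode_rule(rule):
    """Encode one 5-tuple as runs of zeroes separated by single 1's."""
    state, read, write, move, new_state = rule
    counts = [state, SYMBOL_CODE[read], SYMBOL_CODE[write], MOVE_CODE[move], new_state]
    return "1".join("0" * n for n in counts)

def serial_number(rules):
    """Join the encoded rules with 11 and surround the whole thing with 111's."""
    return "111" + "11".join(encode_rule(r) for r in rules) + "111"

rules = [
    (1, "0", "b", "R", 2), (1, "1", "b", "R", 4),
    (2, "0", "b", "R", 2), (2, "b", "b", "R", 3),
    (4, "1", "b", "R", 4), (4, "b", "b", "R", 3),
]
print(encode_rule(rules[0]))   # 0101000100100
print(serial_number(rules))
```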

HOMEWORK

This Turing Machine was designed to spot an input string of either all 1's or all 0's. When the machine reached the final state (halt state), the tape contained all blanks. This indicated that the input string indeed contained all 0's or all 1's. Let's change the machine slightly so that it leaves the original string alone and gives us an "F" for false and a "T" for true.

Let's begin by looking at the input tape and deciding how and where we should place the "F" or "T". The first position on the tape looks like a logical place to store the result.

MORE MACHINES

EVEN-EVEN

This machine will be a validator to ensure that the input string on the tape has an even number of 0's and an even number of 1's. They can be arranged in any way, as long as the count of 0's and the count of 1's are both even.

The machine begins on the first symbol of the string and will leave the tape unchanged and will enter into state 5 (halt) only if the string is "accepted."

A Turing Machine such as this is said to be an acceptor of a language. The language accepted by the TM is the set of all valid strings (each called a sentence) that the machine accepts. For example, the first few sentences in the language of EVEN-EVEN are:

b (empty string), 00, 11, 0000, 0011, 0110, 0101, 1001, 1100, 1010, 1111
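
One possible design for EVEN-EVEN (an assumption, since the state diagram is not reproduced here) uses four working states to remember the parity of the 0's and 1's seen so far, and enters state 5 on reaching the blank only from the "even, even" state. Because the machine only moves right and never changes the tape, a Python sketch needs just a transition table. The names NEXT, accepts and sentences are invented for this example.

```python
from itertools import product

# (state, symbol read) -> next state. State 5 is the halt state.
# 1: even 0's, even 1's    2: odd 0's, even 1's
# 3: even 0's, odd 1's     4: odd 0's, odd 1's
NEXT = {
    (1, "0"): 2, (1, "1"): 3, (1, "b"): 5,
    (2, "0"): 1, (2, "1"): 4,
    (3, "0"): 4, (3, "1"): 1,
    (4, "0"): 3, (4, "1"): 2,
}

def accepts(string):
    """Scan the string left to right, then the first blank; accept only in state 5."""
    state = 1
    for symbol in string + "b":
        if (state, symbol) not in NEXT:
            return False              # no rule applies: the machine crashes
        state = NEXT[(state, symbol)]
    return state == 5

def sentences(max_len):
    """List every accepted string of 0's and 1's up to max_len symbols."""
    found = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            candidate = "".join(bits)
            if accepts(candidate):
                found.append(candidate)
    return found

print(sentences(4))
```

Up to length 4 this produces the same eleven sentences listed above (in a slightly different order).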

ADD

This machine actually performs an arithmetic computation. The input tape will be composed of two positive integers, represented in unary. In other words, the values are represented as,

      1 = 1
      2 = 11
      3 = 111
      ...
      12 = 111111111111
      etc.

The two integers are separated by a "+" symbol. After running the machine, the result on the tape is the sum of the two values.
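
The lecture does not spell out the rules for ADD, so the following Python sketch shows one common strategy as an assumption: overwrite the "+" with a 1 (which joins the two runs but leaves one 1 too many), then blank out the last 1. The function name add_unary is invented here.

```python
def add_unary(tape):
    """Add two unary numbers written as e.g. '11+111' (trailing blanks allowed)."""
    cells = list(tape)
    cells[cells.index("+")] = "1"                         # join the two runs of 1's
    last_one = len(cells) - 1 - cells[::-1].index("1")    # position of the rightmost 1
    cells[last_one] = "b"                                 # remove the extra 1
    return "".join(cells).rstrip("b")

print(add_unary("11+111"))   # 11111, i.e. 2 + 3 = 5
```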

MONUS

The monus operation is defined as,

m monus n = max(m - n, 0)

Thus, monus is integer subtraction where the smallest possible result is 0. The input tape will contain an integer m, represented in unary, followed by a "-" symbol, then an integer n, represented in unary. After running the machine, the result on the tape is m monus n. For example, the tape

11111-111bbbb
^

would become

11bbbbbbbbbbb
 ^

while the tape

111-11111bbbb
^

would become

bbbbbbbbbbbbb
 ^
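
The two tapes above can be checked with a short Python sketch. It assumes one plausible strategy, cancelling one 1 from each side of the "-" until either side runs out; the lecture does not give the actual rules, and the names monus and monus_unary are invented here.

```python
def monus(m, n):
    """Subtraction that bottoms out at 0."""
    return max(m - n, 0)

def monus_unary(tape):
    """Compute m monus n on a tape such as '11111-111bbbb'."""
    left, right = tape.rstrip("b").split("-")
    m, n = list(left), list(right)
    while m and n:        # cancel one 1 from each side per pass
        m.pop()
        n.pop()
    return "".join(m)     # whatever is left of m (possibly nothing)

print(monus_unary("11111-111bbbb"))   # 11
print(monus_unary("111-11111bbbb"))   # prints an empty line: the result is 0
```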

DELETE

This machine will delete the symbol on the input tape which the read head is positioned upon at the beginning of execution. In other words, the tape

01110001bbbb
   ^

would become

0110001bbbb
    ^

INSERT new

This machine will shift everything from current position to the right and insert the symbol new into the current position. For example, if we wished to insert a $ into a string, INSERT $ would modify the tape

01110001bbbb
   ^

into

011$10001bbbb
    ^
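
DELETE and INSERT are both shifting operations, which a TM must do one cell at a time. The Python sketch below mimics that cell-by-cell shifting on a list of cells. It is an illustration under stated assumptions: the names delete and insert are invented, the head position is passed in as an index, the tape is assumed to be non-empty, and the final head position is not modelled.

```python
def delete(tape, head):
    """Remove the symbol under the head by shifting everything after it one cell left."""
    cells = list(tape)
    for i in range(head, len(cells) - 1):
        cells[i] = cells[i + 1]
    cells[-1] = "b"                  # the vacated last cell becomes blank
    return "".join(cells)

def insert(tape, head, new):
    """Shift everything from the head onward one cell right, then write `new`."""
    cells = list(tape) + ["b"]       # the endless tape always has one more blank
    for i in range(len(cells) - 1, head, -1):
        cells[i] = cells[i - 1]
    cells[head] = new
    return "".join(cells)

print(delete("01110001bbbb", 3))       # 0110001bbbbb
print(insert("01110001bbbb", 3, "$"))  # 011$10001bbbb
```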

COPY

This machine will make a copy of the string on the tape, separating it from the original string by a single blank. For example, the input tape

01110001bbbbbbbbbbbbbb
^

would become

01110001b01110001bbbbb
         ^
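
A Python sketch of the same input/output behavior, carrying one symbol per pass as a TM would have to. The name copy_tape is invented here, and the sketch ignores the marking a real TM would need to remember which symbol it is currently copying.

```python
def copy_tape(tape):
    """Append a blank and then a copy of the string, one symbol per pass."""
    cells = list(tape.rstrip("b"))
    n = len(cells)
    cells.append("b")                # the single separating blank
    for i in range(n):
        cells.append(cells[i])       # carry symbol i to the end of the tape
    return "".join(cells)

print(copy_tape("01110001bbbbbbbbbbbbbb"))   # 01110001b01110001
```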

UTM

The Universal Turing Machine. This machine begins with the serial number for any TM, P, and the input, x, to P and then simulates the output for P(x).

In other words, UTM is an interpreter for Turing Machines. It will simulate the action of any other TM. Such a machine would be difficult to describe in state diagrams, but a proof exists that the UTM can be built. We can therefore assume its existence in any of our other discussions, though the proof itself is far beyond the scope of our course.
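
To make the idea of an interpreter concrete, here is a minimal Python sketch. It is not a UTM (it is a Python program, not a Turing Machine), but it does the same job: it takes a serial number and an input x, decodes the rules, and simulates the machine. The names decode, utm and ALL_SAME_SERIAL are invented here. It assumes the {0, 1, b} coding scheme, that state 1 is the start state, and that a state with no rules at all is a halt state, since the serial number itself does not mark halt states.

```python
SYMBOLS = {1: "0", 2: "1", 3: "b"}   # zero-count -> symbol
MOVES = {1: -1, 2: +1}               # zero-count -> head movement (Left, Right)

def decode(serial):
    """Turn a serial number into a table {(state, read): (write, step, new_state)}."""
    body = serial[3:-3]                        # drop the surrounding 111's
    table = {}
    for chunk in body.split("11"):             # rules are separated by 11
        state, read, write, move, new_state = (len(z) for z in chunk.split("1"))
        table[(state, SYMBOLS[read])] = (SYMBOLS[write], MOVES[move], new_state)
    return table

def utm(serial, x, max_steps=10_000):
    """Simulate the encoded machine on input x; returns (outcome, tape)."""
    table = decode(serial)
    live_states = {state for (state, _) in table}   # states that have some rule
    tape = dict(enumerate(x))
    state, head = 1, 0
    outcome = "timeout"                             # step budget exhausted
    for _ in range(max_steps):
        if state not in live_states:
            outcome = "halt"
            break
        key = (state, tape.get(head, "b"))
        if key not in table:
            outcome = "crash"
            break
        write, step, state = table[key]
        tape[head] = write
        head += step
        if head < 0:                                # assumption: left edge is a crash
            outcome = "crash"
            break
    cells = "".join(tape.get(i, "b") for i in range(max(tape, default=-1) + 1))
    return outcome, cells

# Serial number of the all-same machine from the earlier example.
ALL_SAME_SERIAL = "111" + "11".join([
    "0101000100100",        # (1,0,b,R,2)
    "0100100010010000",     # (1,1,b,R,4)
    "00101000100100",       # (2,0,b,R,2)
    "00100010001001000",    # (2,b,b,R,3)
    "0000100100010010000",  # (4,1,b,R,4)
    "0000100010001001000",  # (4,b,b,R,3)
]) + "111"

print(utm(ALL_SAME_SERIAL, "111"))
print(utm(ALL_SAME_SERIAL, "0110"))
```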

Church-Turing Thesis

It's not surprising that we can do these computations, and that people have figured out how to sort, delete data, copy data, do all kinds of math, etc., using Turing Machines. The TM can (probably) do anything that's computable.

"It is believed that there are no functions that can ever be defined by humans, whose calculation can be described by some well-defined mathematical algorithm that people can be taught to perform, that cannot be computed by a TM. Thus the TM is believed to be the ultimate calculating mechanism"

This is a Thesis, not a Theorem, which means it is not provable in normal ways. For example, it is not easy to deal with terms like "Can Ever Be Defined By Humans" or "Algorithms that People can be Taught to Perform", since there are no AXIOMS that include the notion of "PEOPLE" and what they can be taught to do. This is very vague!

Imagine that we had no AXIOMS about Algebra. In that case, we couldn't prove that 2(x+y) = 2x+2y; instead we could make empirical observations such as the following:

x    y   x+y   2(x+y)   2x   2y   2x+2y
--- --- ----- -------- ---- ---- -------
0    0    0       0      0    0      0 
1    1    2       4      2    2      4
2    2    4       8      4    4      8
5    3    8      16     10    6     16
100  4   104    208    200    8    208

Wow! Every time we tried an example, 2(x+y) seemed to be the same as 2x+2y. Is this a proof? Not really. Maybe we were very lucky and there might be cases where the observation won't work out. But it makes us feel good about accepting the notion that 2(x+y) = 2x+2y, without having the axioms of algebra to give us an irrefutable proof.
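
The same kind of empirical evidence can be gathered by machine. The short Python sketch below (the name spot_check and the choice of range are arbitrary) tries many random pairs and reports the first counterexample it finds, if any. Finding none is still not a proof; it only adds rows to the table.

```python
import random

def spot_check(trials=1000, seed=0):
    """Try random integer pairs; return a counterexample (x, y) or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-10**6, 10**6)
        y = rng.randint(-10**6, 10**6)
        if 2 * (x + y) != 2 * x + 2 * y:
            return (x, y)
    return None

print(spot_check())   # None: no counterexample among the pairs tried
```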

The Church-Turing thesis has been examined in this empirical manner. No one has ever been able to define a function whose calculation can be described by some well-defined mathematical algorithm that people can be taught to perform, that cannot be computed by a TM. Perhaps, someday someone may find a task that humans agree is an algorithm but that cannot be executed by a TM. But not so far. And most researchers believe that it will never happen.

Thus, the Church-Turing thesis is accepted as an article of faith by most computer theorists.

TURING MACHINES and COMPUTABILITY

The TM can be used as a model to determine whether something IS computable to start with.

PROBLEMS THAT CANNOT BE SOLVED. EVER.

1. HALTING PROBLEM

It would be helpful if a programmer had a utility tool which would scan a program they are writing and some input string and could alert the programmer:

This program will never halt on this input

--or--

This program will eventually halt on this input.

Can a program, H, be written to compute this output?

THEOREM

H does not exist.

PROOF

Proof by contradiction. Assume H exists.

H is a machine which uses two inputs on the tape:

The serial number of the program, P, to be checked
The input, s, to P

H always halts and correctly answers:

Y, if P(s) eventually will halt
N, if P(s) will run forever

Use the powerful machine H to build the following machine, H':

input: The serial number of the program, P, to be checked
algorithm:
1. Use the COPY TM to duplicate the initial string. For example, the program P might be represented as
1110101010010111
on the input tape. In this case, the tape is modified to
1110101010010111#1110101010010111bbbb
Thus, the tape now contains the serial number of the machine P to represent a machine, and the serial number of P to represent input to the machine. In other words, we will determine how this machine behaves using itself for input.
This may seem strange, but consider the PALINDROME machine presented above. It can be encoded into (admittedly lengthy) serial number format. We could then run that machine on a UTM with its own serial number as the input. The UTM will then determine whether the serial number happens to BE a palindrome.
2. Send P#P into H. H will determine whether the machine will eventually halt given its own serial number as input. (That would be good to know before we try running it on a UTM!) The result will be boolean (Y or N).
3. If the result from H is Y,
        then send the machine into a deliberate infinite loop
    else
        force the machine to halt.

So, H determines if P is a program that halts with itself as input. If so, H' runs forever, but if P is a program that runs forever with itself as input, then H' stops. Before continuing, it might seem that this is a silly machine to construct. That may be true, but the point is, if H exists, then we CAN construct this silly machine, because we've previously demonstrated that it is trivial to copy a tape, it is trivial to go into a deliberate infinite loop, and it is trivial to halt immediately.

Now, here's the magic: after we build H' and determine its serial number, what happens if we send H' into H' as its own input?

If H' will run forever on H' as input, then H' stops.
If H' will halt on H' as input, then H' runs forever.

Umm ..... say that again????

Either way H' contradicts itself: it halts exactly when it does not halt. Since all parts of H' are trivial except H, it follows that the initial assumption that H exists is incorrect.
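
The argument can be replayed in Python against any candidate for H. This is only a sketch of the logic, under loud assumptions: a "program" is represented by a name rather than a serial number, a candidate H is any Python function h(program, input) returning True for "halts", and the behavior of H' is described by a string instead of actually looping forever. The function names are invented here.

```python
def h_prime_behaviour(h, p):
    """What H' does on input p: the opposite of what h predicts for p run on p."""
    return "loops forever" if h(p, p) else "halts"

def h_is_wrong_about_h_prime(h):
    """Compare h's prediction for H'(H') with what H' built from h actually does."""
    predicted = "halts" if h("H'", "H'") else "loops forever"
    actual = h_prime_behaviour(h, "H'")
    return predicted != actual

# Two (hopeless) candidate deciders; any other candidate fails the same way.
always_yes = lambda program, data: True
always_no = lambda program, data: False

print(h_is_wrong_about_h_prime(always_yes))   # True
print(h_is_wrong_about_h_prime(always_no))    # True
```

Whatever h answers for H' on H', the machine built from h does the opposite, so no candidate can be right on every input.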

2. KOLMOGOROV COMPLEXITY

Is a string of numbers random?

EXAMPLE 1:

0100 0100 0100 0100 0100 0100 0100 0100... 0100

This string is clearly non-random. It's 0100 repeated over and over.

A TM could generate n copies of the string with ease.

input tape:

n in unary notation as 1's # s # bbbbb

Algorithm:

1. Erase a 1, or halt if not at a 1.
2. Move to the 1st #.
3. Move right.
4. Copy the string to the first blank position on the tape.
5. Rewind to the first blank.
6. Move right.
7. Repeat.

Pretty short program to build a very large string. Thus, the string is not random.
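
The point is easy to see in Python as well: the generating program stays the same size no matter how large n is. The sketch below loosely mirrors the tape algorithm (erase one 1 from the counter, append one copy of s, repeat); the name repeat is invented here.

```python
def repeat(n, s):
    """Build n copies of s, consuming a unary counter one 1 per pass."""
    counter = "1" * n
    output = ""
    while counter:               # halt when no 1's are left
        counter = counter[1:]    # erase a 1
        output += s              # copy the string to the end of the output
    return output

big = repeat(1000, "0100")
print(len(big))      # 4000 symbols from a program only a few lines long
print(big[:16])      # 0100010001000100
```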

EXAMPLE 2:

0011 0001 0100 0001 0101 1001

Seems pretty random (it's too small to be of any interest, but one could imagine a similar larger string). Suppose our compression software doesn't compress it. Does that mean it's random? Or does it mean that we don't have the world's ULTIMATE COMPRESSOR? Is it random? What do you think? (Answer: read each 4-bit group as a binary-coded decimal digit and you get 3 1 4 1 5 9, the first digits of pi.)

EXAMPLE 3:

1011 0101 0000 0100 1111 0011

Again, it seems pretty random. But, can you find a secret pattern?

EXAMPLE 4:

0000 1011 1100 0011 1101 0111 1011 1001 0111 1001 1111 1100

Can you find the pattern this time?

HOW DO WE JUDGE THE RANDOMNESS OF A STRING?

This is an important question since many encryption algorithms use randomly chosen prime numbers as a basis of encoding messages.

In the early 1960s, the Soviet mathematician Kolmogorov described the randomness of a string, s, in terms of the length of the smallest program and data set, P(x), which could generate it. That is, P and its input, x, are given to a UTM, and the result is UTM(P,x) = s.

For convenience we may think of P and x taken together as P' which is the smallest program which will generate s on a UTM.

Using mathematical symbolism,

K(s) = min { length(P') : P' is a program which generates s on a UTM }

The larger the value of K(s), the more incompressible, or random, s is. The value of K(s)/length(s) is the fraction of its original size to which s can be reduced by the best possible compressor.

QUESTION:

Do random strings exist, that is, are there strings that are not compressible?

This question has a definite answer. There is a PROOF that shows INCOMPRESSIBLE STRINGS EXIST IN ABUNDANCE! The proof is beyond the scope of this course, but the results are striking. For example, the proof shows that of all the 100-bit strings that are possible, only about 1 string in 32 million can be compressed to 75% of its original size (or smaller). If we want to compress a 100-bit string to 20% of its original size, then only about 1 string in one thousand billion trillion can be compressed this much!
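
Figures of this size come out of a simple counting argument, sketched below in Python. This is a standard argument and not necessarily the proof the lecture refers to: there are only 2^k - 1 binary strings shorter than k bits, so at most that many n-bit strings can have a description shorter than k bits. The function name is invented, and the exact constants depend on whether "75%" is read as "at most" or "fewer than" 75 bits.

```python
from fractions import Fraction

def compressible_fraction_bound(n, k):
    """Upper bound on the fraction of n-bit strings with a description shorter than k bits."""
    descriptions = 2**k - 1          # all binary strings of length 0 .. k-1
    return Fraction(descriptions, 2**n)

for k in (75, 20):
    bound = compressible_fraction_bound(100, k)
    print(f"shorter than {k} bits: at most 1 string in {float(1 / bound):.3g}")
```

For k = 75 the bound is just under 2^-25, about 1 in 33.5 million; for k = 20 it is just under 2^-80, about 1 in 1.2 x 10^24.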

QUESTION:

Can a program be designed which will examine a string, s, and determine whether it is one of the many random strings? This is the same as asking it to determine K(s), since that's our metric for randomness.

THEOREM

No program exists to compute K(s).

Note: we can approximate K(s) by running our best compressor, but that doesn't guarantee that we've found the hidden best way to compress the string (remember the BCD pi example!)
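
As a sketch of that approximation, Python's standard zlib module can stand in for "our best compressor". The compressed size is only an upper bound on K(s), up to the fixed size of the decompressor; it says nothing about hidden patterns the compressor misses. The helper name compressed_size and the test data are chosen here for illustration, and the "noisy" bytes come from a seeded pseudo-random generator, so they are in fact generated by a short program even though zlib cannot tell.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

repetitive = b"0100" * 1000                                   # 4000 bytes, obvious pattern
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(4000))        # 4000 pseudo-random bytes

print(len(repetitive), compressed_size(repetitive))   # shrinks dramatically
print(len(noisy), compressed_size(noisy))             # barely changes, or grows slightly
```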

PROOF

Again, by contradiction.

Assume that program P exists (as a TM, for example) to compute K(s). Use this powerful machine to do the following:

1. Design a simple COUNTER TM which successively outputs all 1-character strings, then all 2-character strings, all 3-character strings, etc. This is a trivial machine to construct.
2. Connect this machine to P as an input source. The output is the "compressibility" or "randomness" of each of the strings in the universe of the alphabet. Most of those strings are random. A scarce few are not.
3. If the string is not "random", simply ask the counter to try the next one.
4. However, if the string is random, then determine whether it is longer than the description of the TM that we're building right now. The final size of that TM (by some metric) will be known in a moment. If it's not longer than this program, simply ask the counter for the next one. BUT ... magic time ... if it IS longer than this program, output the string.
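
The COUNTER machine from step 1 really is trivial. A minimal Python sketch, with the function name and the default alphabet "01" chosen here for illustration:

```python
from itertools import count, islice, product

def all_strings(alphabet="01"):
    """Yield every string over the alphabet: all 1-character strings, then all 2-character strings, etc."""
    for n in count(1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

first_seven = list(islice(all_strings(), 7))
print(first_seven)   # ['0', '1', '00', '01', '10', '11', '000']
```

The generator never terminates on its own, which is fine: the construction only needs it to eventually reach every string.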

Now, let's review.... What will this program do? It will output the first incompressible string longer than this program. But .... if the string is longer than this program, then THIS PROGRAM is a COMPRESSOR for the random (i.e. INCOMPRESSIBLE) STRING!
       // P is the hypothetical program that computes K(s)
       done = false;
       counter = "";
       WHILE (!done)
       {
           counter = next(counter);  // generates, one at a time, all 1-bit strings, then
                                     // all 2-bit strings, then all 3-bit strings, etc.
           s = counter;
           Ks = P(s);
           IF (Ks < s.length)
           {
               // string is compressible, do nothing
           }
           ELSE
           {
               // string is not compressible, thus it is RANDOM
               IF (s.length > lengthOfThisProgram)
               {
                  alert(s);
                  done = true;
               }
           }
       }
So this program outputs a string (which is longer than this program) that is incompressible, but was just compressed by the use of this program. That is a contradiction. Since all of the elements of the program are known to exist besides P, we must conclude that the initial assumption of P's existence was incorrect.